-
- HTTrack version 3.23+swf (compiled Mar 8 2003)
- usage: C:\PROGRA~1\MOZILLA.ORG\MOZILL~1\CHROME\SPIDER~1\CONTENT\HTTRACK\HTTRACK.EXE <URLs> [-option] [+<FILTERs>] [-<FILTERs>]
- with options listed below: (* is the default value)
-
- General options:
- O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
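- for instance (illustrative; 'mymirror' and 'mycache' are placeholder paths):
- example: httrack www.someweb.com/ -O mymirror,mycache
- means: store the mirrored files under mymirror/ and the cache and logfiles under mycache/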
-
- Action options:
- w *mirror web sites (--mirror)
- W mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
- g just get files (saved in the current directory) (--get-files)
- i continue an interrupted mirror using the cache (--continue)
- Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
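- for instance (illustrative; the file name is a placeholder):
- example: httrack www.someweb.com/bob/archive.zip -g
- means: just download archive.zip into the current directory, without following any links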
-
- Proxy options:
- P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
- %f *use proxy for ftp (%f0 don't use) (--httpproxy-ftp[=N])
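- for instance (illustrative; the proxy host and credentials are placeholders):
- example: httrack www.someweb.com/ -P user:pass@proxy.myhost.com:8080 -%f0
- means: mirror through the authenticated proxy, but do not use the proxy for ftp transfers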
-
- Limits options:
- rN set the mirror depth to N (* r9999) (--depth[=N])
- %eN set the external links depth to N (* %e0) (--ext-depth[=N])
- mN maximum file length for a non-html file (--max-files[=N])
- mN,N2 maximum file length for non-html (N) and html (N2)
- MN maximum overall size that can be uploaded/scanned (--max-size[=N])
- EN maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
- AN maximum transfer rate in bytes/second (1000=1KB/s max) (--max-rate[=N])
- %cN maximum number of connections/second (*%c10) (--connection-per-second[=N])
- GN pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
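- for instance (illustrative; the values are placeholders):
- example: httrack www.someweb.com/ -r6 -m500000 -M50000000 -E3600 -A25000
- means: mirror with depth 6, non-html files limited to 500000 bytes, 50000000 bytes overall, one hour maximum, at most 25000 bytes/second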
-
- Flow control:
- cN number of multiple connections (*c8) (--sockets[=N])
- TN timeout, number of seconds after which a non-responding link is shut down (--timeout)
- RN number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
- JN traffic jam control, minimum transfer rate (bytes/second) tolerated for a link (--min-rate[=N])
- HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
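- for instance (illustrative; the values are placeholders):
- example: httrack www.someweb.com/ -c4 -T30 -R2 -H3
- means: use 4 simultaneous connections, a 30-second timeout, 2 retries, and abandon a host that times out or is too slow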
-
- Links options:
- %P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
- n get non-html files 'near' an html file (ex: an image located outside) (--near)
- t test all URLs (even forbidden ones) (--test)
- %L <file> add all URLs located in this text file (one URL per line) (--list <param>)
- %S <file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
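- for instance (illustrative; 'myurls.txt' is a placeholder file with one URL per line):
- example: httrack www.someweb.com/ -%L myurls.txt -n
- means: also add every URL listed in myurls.txt, and fetch non-html files located 'near' the downloaded pages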
-
- Build options:
- NN structure type (0 *original structure, 1+: see below) (--structure[=N])
- or user defined structure (-N "%h%p/%n%q.%t")
- LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
- KN keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links) (--keep-links[=N])
- x replace external html links by error pages (--replace-external)
- %x do not include any password for external password protected websites (%x0 include) (--no-passwords)
- %q *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
- o *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
- X *purge old files after update (X0 keep them) (--purge-old[=N])
- %p preserve html files 'as is' (identical to '-K4 -%F ""') (--preserve)
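- for instance (illustrative):
- example: httrack www.someweb.com/ -N1 -L1 -x
- means: put html files in web/ and images/other files in web/images/, keep long file names, and replace external links by error pages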
-
- Spider options:
- bN accept cookies in cookies.txt (0=do not accept,* 1=accept) (--cookies[=N])
- u check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
- j *parse Java Classes (j0 don't parse) (--parse-java[=N])
- sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])
- %h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
- %k use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
- %B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
- %s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
- %A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
- shortcut: '--assume standard' is equivalent to -%A php2,php3,php4,php,cgi,asp,jsp,pl,cfm=text/html
- @iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
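- for instance (illustrative):
- example: httrack www.someweb.com/ -s0 -%k -%A php,cgi=text/html
- means: ignore robots.txt rules, use keep-alive connections when possible, and assume that php and cgi links always point to html documents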
-
- Browser ID:
- F user-agent field (-F "user-agent name") (--user-agent <param>)
- %F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)
- %l preferred language (-%l "fr, en, jp, *") (--language <param>)
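- for instance (illustrative; the user-agent string and languages are placeholders):
- example: httrack www.someweb.com/ -F "MyBrowser/1.0" -%l "fr, en, *"
- means: identify as MyBrowser/1.0 and prefer French, then English, then any language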
-
- Log, index, cache:
- C create/use a cache for updates and retries (C0 no cache, C1 cache has priority, *C2 test update before) (--cache[=N])
- k store all files in cache (not useful if files on disk) (--store-all-in-cache)
- %n do not re-download locally erased files (--do-not-recatch)
- %v display downloaded filenames on screen (in real time) - *%v1 short version (--display)
- Q no log - quiet mode (--do-not-log)
- q no questions - quiet mode (--quiet)
- z log - extra infos (--extra-log)
- Z log - debug (--debug-log)
- v log on screen (--verbose)
- f *log in files (--file-log)
- f2 one single log file (--single-log)
- I *make an index (I0 don't make) (--index)
- %I make a searchable index for this mirror (* %I0 don't make) (--search-index)
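- for instance (illustrative):
- example: httrack www.someweb.com/ -C1 -z -I0
- means: reuse the cache with priority (without testing the server first), write extra information to the logs, and do not build an index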
-
- Expert options:
- pN priority mode: (* p3) (--priority[=N])
- p0 just scan, don't save anything (for checking links)
- p1 save only html files
- p2 save only non html files
- *p3 save all files
- p7 get html files first, then the other files
- S stay on the same directory (--stay-on-same-dir)
- D *can only go down into subdirs (--can-go-down)
- U can only go to upper directories (--can-go-up)
- B can both go up&down into the directory structure (--can-go-up-and-down)
- a *stay on the same address (--stay-on-same-address)
- d stay on the same principal domain (--stay-on-same-domain)
- l stay on the same TLD (eg: .com) (--stay-on-same-tld)
- e go everywhere on the web (--go-everywhere)
- %H debug HTTP headers in logfile (--debug-headers)
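- for instance (illustrative):
- example: httrack www.someweb.com/bob/ -d -p1
- means: save only html files, and stay on the same principal domain (someweb.com) instead of the exact starting address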
-
- Guru options: (do NOT use if possible)
- #X *use optimized engine (limited memory boundary checks) (--fast-engine)
- #0 filter test (-#0 '*.gif' 'www.bar.com/foo.gif') (--debug-testfilters <param>)
- #C cache list (-#C '*.com/spider*.gif') (--debug-cache <param>)
- #f always flush log files (--advanced-flushlogs)
- #FN maximum number of filters (--advanced-maxfilters[=N])
- #h version info (--version)
- #K scan stdin (debug) (--debug-scanstdin)
- #L maximum number of links (-#L1000000) (--advanced-maxlinks)
- #p display ugly progress information (--advanced-progressinfo)
- #P catch URL (--catch-url)
- #R old FTP routines (debug) (--debug-oldftp)
- #T generate a transfer operations log every minute (--debug-xfrstats)
- #u wait time (--advanced-wait)
- #Z generate transfer rate statistics every minute (--debug-ratestats)
- #! execute a shell command (-#! "echo hello") (--exec <param>)
-
- Command-line specific options:
- V execute a system command after each file ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
- %U run the engine with another id when called as root (-%U smith) (--user <param>)
-
- Details: Option N
- N0 Site-structure (default)
- N1 HTML in web/, images/other files in web/images/
- N2 HTML in web/HTML, images/other in web/images
- N3 HTML in web/, images/other in web/
- N4 HTML in web/, images/other in web/xxx, where xxx is the file extension (all .gif files will be placed in web/gif, for example)
- N5 Images/other in web/xxx and HTML in web/HTML
- N99 All files in web/, with random names (gadget !)
- N100 Site-structure, without www.domain.xxx/
- N101 Identical to N1 except that "web" is replaced by the site's name
- N102 Identical to N2 except that "web" is replaced by the site's name
- N103 Identical to N3 except that "web" is replaced by the site's name
- N104 Identical to N4 except that "web" is replaced by the site's name
- N105 Identical to N5 except that "web" is replaced by the site's name
- N199 Identical to N99 except that "web" is replaced by the site's name
- N1001 Identical to N1 except that there is no "web" directory
- N1002 Identical to N2 except that there is no "web" directory
- N1003 Identical to N3 except that there is no "web" directory (structure used with the g option)
- N1004 Identical to N4 except that there is no "web" directory
- N1005 Identical to N5 except that there is no "web" directory
- N1099 Identical to N99 except that there is no "web" directory
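- for instance (illustrative; 'mymirror' is a placeholder path):
- example: httrack www.someweb.com/ -O mymirror -N4
- means: html files go into mymirror/web/, and other files are sorted by extension (a .gif file goes into mymirror/web/gif, for example)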
- Details: User-defined option N
- '%n' Name of file without file type (ex: image)
- '%N' Name of file, including file type (ex: image.gif)
- '%t' File type (ex: gif)
- '%p' Path [without ending /] (ex: /someimages)
- '%h' Host name (ex: www.someweb.com)
- '%M' URL MD5 (128 bits, 32 ascii bytes)
- '%Q' query string MD5 (128 bits, 32 ascii bytes)
- '%q' small query string MD5 (16 bits, 4 ascii bytes)
- '%s?' Short name version (ex: %sN)
- '%[param]' param variable in query string
- '%[param:before:after:notfound:empty]' advanced variable extraction
- Details: User-defined option N and advanced variable extraction
- %[param:before:after:notfound:empty]
- param : parameter name
- before : string to prepend if the parameter was found
- after : string to append if the parameter was found
- notfound : string replacement if the parameter could not be found
- empty : string replacement if the parameter was empty
- all fields, except the first one (the parameter name), can be empty
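- a hedged sketch based on the field descriptions above (the URL and resulting file name are illustrative):
- example: httrack "www.someweb.com/list.cgi?page=3" -N "%h%p/%n%[page:-page-:::].%t"
- means: the value of the 'page' query parameter is inserted into the local file name, so list.cgi?page=3 would be saved as something like .../list-page-3.html (if the parameter is missing or empty, nothing is inserted)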
-
- Details: Option K
- K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
- K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
- K4 -> foo.cgi?q=45 (original URL)
- K3 -> /folder/foo.cgi?q=45 (absolute URI)
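- for instance (illustrative):
- example: httrack www.someweb.com/bob/ -K4
- means: leave links exactly as they appear in the original pages (original URL), instead of rewriting them as relative links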
-
- Shortcuts:
- --mirror <URLs> *make a mirror of site(s) (default)
- --get <URLs> get the files indicated, do not seek other URLs (-qg)
- --list <text file> add all URLs located in this text file (-%L)
- --mirrorlinks <URLs> mirror all links in 1st level pages (-Y)
- --testlinks <URLs> test links in pages (-r1p0C0I0t)
- --spider <URLs> spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
- --testsite <URLs> identical to --spider
- --skeleton <URLs> make a mirror, but gets only html files (-p1)
- --update update a mirror, without confirmation (-iC2)
- --continue continue a mirror, without confirmation (-iC1)
-
- --catchurl create a temporary proxy to capture a URL or a form post URL
- --clean erase cache & log files
-
- --http10 force http/1.0 requests (-%h)
-
-
- example: httrack www.someweb.com/bob/
- means: mirror site www.someweb.com/bob/ and only this site
-
- example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
- means: mirror the two sites together (with shared links) and accept any .jpg files on .com sites
-
- example: httrack www.someweb.com/bob/bobby.html +* -r6
- means: get all files starting from bobby.html, with a link depth of 6, and the possibility of going anywhere on the web
-
- example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
- runs the spider on www.someweb.com/bob/bobby.html using a proxy
-
- example: httrack --update
- updates a mirror in the current folder
-
- example: httrack
- starts the interactive mode
-
- example: httrack --continue
- continues a mirror in the current folder
-
- HTTrack version 3.23+swf (compiled Mar 8 2003)
- Copyright (C) Xavier Roche and other contributors
-